1. Estimating Survey Costs: A Case Study

The objective of surveys is to collect meaningful data for the analysis
and understanding of various aspects of human behavior. In recent years,
survey methods have been greatly improved to yield more accurate and
reliable data. Based on survey data, empirical validation of an established
theory may be obtained, or a new theory may be postulated. Conducting
surveys, hence, is vital for the development of the social sciences and related
disciplines. This invaluable service is provided by a number of survey
institutions in the country. However, if these survey institutions are to
continue to provide such vital services, it is crucial that they estimate survey
costs for diverse projects with reasonable accuracy and keep them under control.

The purpose of this study is to systematize the budgeting of a survey
operation by representing the various costs incurred within the framework of
an econometric model. Two alternative models are formulated in this paper,
as described in Section 3. As a prior step, however, the next section
outlines the nature of the problem and presents the conceptual framework within
which the analysis is carried out. The econometric formulations are then
provided in Section 3, with empirical results obtained from testing these
models in Section 4. The goodness of fit of the models as well as their
predictive accuracy as applied to additional data are compared in Section 5.
A final section summarizes the findings, indicates how they may be applied
in actual survey work, and provides suggestions for future research.

2. Conceptual Framework

The Survey Research Laboratory (SRL) of the University of Illinois,
established in 1964, has conducted over 175 surveys for University staff,
students, public agencies and other outside groups.

The operations of the Laboratory are divided into the following five
major sections: 1) project coordination, 2) sampling, 3) field, 4) data
reduction, and 5) data processing. A project coordinator is assigned to
each survey project. The role of the project coordinator is to consult with
the sponsor and work out the details of the survey design. He (she) also
works with the other section heads on the preparation of a detailed expense
budget for the survey. In deriving detailed expense estimates, these people
apply their individual "past experience" in planning surveys. The budget is
used as a basis not only for the cost estimate of the survey submitted to
the client, but also for later internal control of expenses incurred on the
survey. A brief description of the cost ingredients of the operations of the
five sections is essential for model formulation and is provided in the
following paragraph.

The costs classified under the project coordination section are mainly
the project coordinator's salary, traveling expenses, and clerical wages and
expenses. A common cost ingredient of the other four sections is the
salaries of the respective section heads. In addition, sampling cost includes
the salaries of the sampling staff and material expenses. The field cost
is composed of a large number of cost items for selecting, hiring and
training interviewers, for the pretest, and for the final collection of the data.
These costs include, for example, salaries and wages of staff and interviewers,
their traveling expenses, clerical salaries and expenses, materials and postage,
cost of questionnaire reproduction, etc. The components of data reduction
cost include salaries and wages of the keypunch staff as well as the
control staff that examines the quality of data, coding wages, plus clerical
and material expenses and machine rental. Finally, the data processing cost
includes the salaries and wages of the computer programmers, material
expenses, machine rental and computer expenses.

The  data  used  in  this  study  are  obtained  from  SRL  surveys  completed 
between  spring  1970  and  summer  1973.  Earlier  surveys  are  excluded  due  to 
lack  of  representativeness  as  well  as  absence  of  key  data. 

These surveys can be classified by the method used into five categories:
1) telephone, 2) mail, 3) personal interview, 4) self-administered, and 5)
combination of the previous categories. It is within this framework that
econometric models for survey costs will be formulated.

3.  The  Models 

The dependent variables are the costs of the aforementioned five
operations or sections. The independent variables are characteristics of the
survey. For simplicity, a linear relationship between the two sets is assumed,
so that the cost of a survey is given by a system of five linear equations.

Taking  into  consideration  the  five  different  categories  of  surveys, 
two  approaches  will  be  taken  under  different  assumptions: 

Approach I. It is assumed that the costs of the five operations are from
distinct populations for different categories of surveys. This assumption
is substantiated by a discriminant analysis of the five section costs for
the telephone, mail, and personal interview surveys as three groups.*
The results gave a Rao's F-ratio approximation of 3.72 with (10, 72) degrees
of freedom, significant beyond the 0.01 level. Therefore, the null
hypothesis of equal group centroids on these five cost variables is rejected.
Under the above assumption, a system of five linear equations is used for each
of the first three categories of surveys.

*The other two categories of surveys are excluded from the analysis due
to very small numbers of observations.

The models for the three categories of surveys are all partially recursive
and similar. It is assumed that the cost of a survey operation (Y_i) is
influenced by a set of survey characteristics (Z), which are treated as
exogenous. In addition, the cost of project coordination (Y_1) is also
determined by the cost of field operation (Y_3) in the case of telephone and
personal interview surveys, or by the cost of data reduction operation (Y_4)
in the case of mail surveys. The choice is based on the different nature of
operations in these types of surveys. The linear model is formulated as
follows:

    1.  Y_1 = a_0 + a_1 Y_3 + a_2 Y_4 + a_3 Z_j + u

    2.  Y_2 = b_0 + b_1 Z_j + v

    3.  Y_3 = c_0 + c_1 Z_j + \mu

    4.  Y_4 = d_0 + d_1 Z_j + \delta

    5.  Y_5 = e_0 + e_1 Z_j + \epsilon

where a_2 = 0 in the case of telephone and personal interview surveys, and
a_1 = 0 in the case of mail surveys.
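
As an illustration of how such a system is evaluated, the following sketch in
Python computes the five costs for one survey. It assumes, for brevity, a
single survey characteristic Z_j in every equation. The coefficient values are
hypothetical placeholders, not estimates from this study, and the names
predict_costs and HYPOTHETICAL_COEF are introduced here only for illustration.

```python
# Hypothetical coefficients, for illustration only (not the paper's estimates).
# Equation 1 holds (a_0, a_1, a_2, a_3); equations 2-5 hold (intercept, slope).
HYPOTHETICAL_COEF = {
    1: (100.0, 0.5, 0.25, 2.0),
    2: (50.0, 1.0),
    3: (200.0, 4.0),
    4: (80.0, 1.5),
    5: (60.0, 0.5),
}


def predict_costs(z, coef, survey_type):
    """Evaluate the five-equation system for one survey.

    z           -- value of the single survey characteristic Z_j (an assumption)
    coef        -- dict mapping equation number to its coefficient tuple
    survey_type -- "telephone", "personal", or "mail"
    """
    # Equations 2-5 depend only on the exogenous characteristic.
    y = {i: coef[i][0] + coef[i][1] * z for i in (2, 3, 4, 5)}

    a0, a1, a2, a3 = coef[1]
    if survey_type == "mail":
        a1 = 0.0   # field cost does not enter equation 1
    elif survey_type in ("telephone", "personal"):
        a2 = 0.0   # data reduction cost does not enter equation 1
    else:
        raise ValueError("unknown survey type: " + survey_type)

    # Equation 1 uses the costs already predicted by equations 3 and 4.
    y[1] = a0 + a1 * y[3] + a2 * y[4] + a3 * z
    return y


telephone = predict_costs(100.0, HYPOTHETICAL_COEF, "telephone")
mail = predict_costs(100.0, HYPOTHETICAL_COEF, "mail")
print(telephone[1], mail[1])
```

Because equations 2 to 5 contain no cost variable on the right-hand side, they
can be evaluated first, and equation 1 then takes the predicted Y_3 or Y_4 as
an input. This ordering is what makes the system recursive in part.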

Approach II. It is assumed that the costs of the sampling (Y_2), data
reduction (Y_4), and data processing operations (Y_5) are from the same
population regardless of survey method. This assumption is also substantiated
by a discriminant analysis of these three cost variables for the first three
categories of surveys. The results gave a Rao's F-ratio approximation of
1.63 with (6, 56) degrees of freedom. Therefore, the hypothesis of equal
group centroids on these three cost variables is not rejected at the 0.05 level
of significance. Under this assumption, the three cost functions Y_2, Y_4
and Y_5 will be estimated by using all types of surveys.
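
The degrees of freedom quoted for the two discriminant analyses can be checked
against Rao's approximation to the distribution of Wilks' lambda. The sketch
below assumes the usual textbook form of that approximation, since the paper
does not print the formula. The survey counts of 43 and 33 used in the example
are back-solved here from the reported degrees of freedom and are not figures
stated in the text. The function names are illustrative.

```python
import math


def rao_terms(p, g, n):
    """Return (df1, df2, s) for Rao's F approximation to Wilks' lambda.

    p -- number of cost variables
    g -- number of groups (survey categories)
    n -- total number of surveys (assumed; not reported in the paper)
    """
    q = g - 1                      # hypothesis degrees of freedom
    df1 = p * q
    denom = p * p + q * q - 5
    # s is conventionally taken as 1 when the denominator is not positive.
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    m = n - 1 - (p + g) / 2
    df2 = m * s - df1 / 2 + 1
    return df1, df2, s


def rao_f(wilks_lambda, p, g, n):
    """Convert a Wilks' lambda value into Rao's approximate F-ratio."""
    df1, df2, s = rao_terms(p, g, n)
    root = wilks_lambda ** (1.0 / s)
    return (1.0 - root) / root * (df2 / df1)


# Five cost variables, three survey categories (Approach I).
print(rao_terms(5, 3, 43)[:2])
# Three cost variables, three survey categories (Approach II).
print(rao_terms(3, 3, 33)[:2])
```

With three groups the first call gives 10 and 72 degrees of freedom and the
second gives 6 and 56, which agrees with the values quoted under the two
approaches.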

Both  approaches  are  used,  since  their  underlying  assumptions  are  not 
necessarily  mutually  exclusive.  Single  equation  least-squares  is  used  to 
estimate  the  cost  functions. 
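
A minimal sketch of single-equation least squares for one cost function
follows, together with an adjusted R^2 of the kind reported in the tables. The
data are invented for illustration, only one regressor is used, the
coefficients are raw rather than standardized, and the helper names fit_line
and adjusted_r2 are not from the paper.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def adjusted_r2(x, y, intercept, slope, k=1):
    """Adjusted R^2 for a fitted line with k regressors (k = 1 here)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Invented data: sample size against sampling cost for four surveys.
sample_size = [100.0, 200.0, 300.0, 400.0]
sampling_cost = [160.0, 240.0, 360.0, 440.0]

b0, b1 = fit_line(sample_size, sampling_cost)
print(b0, b1, adjusted_r2(sample_size, sampling_cost, b0, b1))
```

The adjustment penalizes R^2 for the number of regressors, which matters in
this study because each function is fitted to a small number of surveys.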

4.  Regression  Results 

The estimated functions for the two models are summarized in Tables 1-4.
The independent variables included in each function are a subset of a larger
number of variables initially tried for that function. Variables whose
estimated coefficients had the wrong sign have generally been eliminated.

Approach I. In the specification of the cost function for project
coordination, field cost is included as a predetermined variable in the case of
telephone surveys and personal interview surveys, and data reduction cost is
included in the case of mail surveys. This choice is based on the different
nature of operations in these three types of surveys, as well as the fact
that the goodness of fit is surprisingly high for the field cost functions
on telephone and personal interview surveys and for the data reduction
function on mail surveys. The chosen predetermined variable is statistically
significant in each case.

On an overall basis, the five cost functions on telephone surveys have
the best fit. The key determinants of project coordination cost are field
cost and whether the coordinator wrote a report or not. The key determinant
of sampling cost, as well as of field cost, is sample size. The key
determinants of data reduction cost and of data processing cost are total number
of questionnaires keypunched and total number of cards punched, respectively.

The  overall  goodness  of  fit  of  the  cost  functions  on  mail  and  personal 
interview  surveys  shows  a  mixed  picture.   In  both  cases,  the  data  processing 
cost  function  has  the  poorest  fit.  In  addition,  the  sampling  cost  function 
on  mail  surveys  has  a  poor  fit,  mainly  due  to  the  fact  that  the  mailing  list 
is  provided  by  the  client  in  most  cases.  As  a  result,  sample  size,  a  key 
determinant  of  sampling  cost  in  all  the  other  models  in  this  study,  cannot 
be  of  use  in  this  function. 

The  key  determinant  of  project  coordination  cost  on  mail  surveys  is 
data  reduction  cost.  On  the  other  hand,  data  reduction  cost  is  determined 
by  coding  work  needed,  number  of  different  questionnaires  used  in  a  survey, 
and  total  number  of  pages  of  questionnaires  keypunched.  The  field  cost  on 
mail  surveys  is  determined  by  the  positive  effect  of  sample  size,  response 
rate,  number  of  new  interviewers  used  in  the  telephone  follow-up,  and  whether 
thank-you  letters  are  sent. 


It is rather puzzling that the total number of interviewers used in
the telephone follow-up and the conduct of a pretest have negative effects on
field cost. The explanation might be that the effect of the total number
of interviewers should be considered together with the number of new
interviewers. In addition, having conducted a pretest in a mail survey might
contribute to efficiency in later field operations.

Finally, the regression results on personal interview surveys are
considered. The key determinants of project coordination cost are field cost
and number of months spent on the survey. It is noted that whether the list
of sampled units is prepared by SRL, a variable significant at the 0.05 level,
bears a negative sign in the sampling cost function. However, no explanation
is attempted, since the function itself is not significant at the 0.05 level.
Furthermore, estimated coefficients for the field cost as well as data
reduction cost present a mixed picture. The most puzzling result is that the
number of responses, a variable significant at the 0.01 level, has a negative
effect on field cost, as do thank-you letters sent and number of months spent
on the survey. It is quite reasonable that the key variables with positive
effects on field cost are number of interviewers, number of new interviewers,
response rate and advance letters being sent. Similarly, data reduction cost
is determined by the positive effects of total number of questionnaires
keypunched and average number of cards per questionnaire and by the negative
effects of average number of pages per questionnaire and coding work needed.
No interpretation of such mixed effects is attempted at this time.

Approach  II.   In  addition  to  the  three  types  of  surveys  used  in  Approach 
I,  the  few  observations  on  self-administered  surveys  and  combination  surveys 
are  also  included  in  the  estimation  of  the  three  cost  functions  in  Table  4. 

Table 1. Regression Results on Telephone Survey

                                            Cost function
                                            Project                                   Data          Data
Variable                                    Coordination  Sampling      Field         Reduction     Processing

Field cost                                  0.69**
Coordinator wrote report                    0.35**
No. of months on survey                     0.06
Coordinator's salary rate                   0.08
List made by SRL                                          0.11
Sample size                                               0.84**        0.75**
Sent advance letters                                                    0.08
No. of drafts of questionnaire                                          0.15*
No. of interviewers                                                     0.19*
Location of interviewers                                                0.10*
Location of reduction work                                                            0.06
Total no. of questionnaires punched                                                   0.94**
Total no. of cards punched                                                                          0.85**
Data cleaning done                                                                                  0.20
Frequency tabulation run                                                                            0.02
Other computer analysis run                                                                         0.15

Adj R^2                                     0.88**        0.60**        0.97**        0.93**        0.87**

*Significant at 0.05 level
**Significant at 0.01 level

Table 2. Regression Results on Mail Survey


Cost  function 


Variable 

Data  reduction  cost 

Coordinator  wrote  report 

No.  of  months  on  survey 

Sampling  method 

List  made  by  SRL 

No.  of  populations 

Sent  thank  you  letters 

Pre-test  done 

Sample  size 

Total  no.  of  interviewers 

No.  of  new  interviewers 

Response  rate 

Coding  work  needed 

No.  of  different  questionnaires 

Total  pages  punched 

Total  no.  of  questionnaires  punched 

Data  cleaning  done 

Other  computer  analysis  run 


Project 
Coordination 

0.57** 

0.09 

0.42* 


Sampling  Field 


0.06 
0.34 
0.71 


Data 
Reduction 


Data 
Processing 


0.111 


0.51** 
-0.15* 
0.50** 
-0.49**
0.95** 
0.20** 


0.22** 
0.27** 
0.80** 


0.45 
0.36 
0.59 


Adj R^2


0.71 


** 


0.67 


0.97' 


0.93' 


0.06 


*Significant at 0.05 level
**Significant  at  0.01  level 

Table 3. Regression Results on Personal Interview Survey

                                            Cost function
                                            Project                                   Data          Data
Variable                                    Coordination  Sampling      Field         Reduction     Processing

Field cost                                  0.42*
No. of months on survey                     0.66**                      -0.21
List made by SRL                                          -0.78**
Sample size                                               0.88**
Sent advance letters                                                    1.20**
Sent thank you letters                                                  -0.67**
No. of responses                                                        -4.11**
No. of interviewers                                                     3.65**
No. of new interviewers                                                 2.88**
Response rate                                                           1.58**
Coding work needed                                                                    -0.68*
Total no. of questionnaires punched                                                   0.65**
Ave. no. of pages per questionnaire                                                   -0.83**
Ave. percent of unstructured questionnaire                                            0.39
Ave. no. of cards per questionnaire                                                   1.43**
Total no. of cards punched                                                                          0.53
Frequency tabulation run                                                                            0.48
Other computer analysis run                                                                         0.41

Adj R^2                                     0.55*         0.54          0.95**        0.83**        0.24

*Significant at 0.05 level
**Significant at 0.01 level

Table 4. Regression Results on All Types of Surveys

                                            Cost function
                                                          Data          Data
Variable                                    Sampling      Reduction     Processing

Sampling method                             0.16
Sample size                                 0.50**
Coding work needed                                        0.13
No. of different questionnaires                           0.10
Total no. of questionnaires punched                       0.72**
Ave. no. of cards per questionnaire                       0.38**
Total no. of cards punched                                              0.55**
Data cleaning done                                                      0.17
Frequency tabulation run                                                0.16
Other computer analysis run                                             0.28*

Adj R^2                                     0.25**        0.75**        0.38**

*Significant at 0.05 level
**Significant at 0.01 level

The results support the findings by Approach I. Again, sample size is the
key determinant of sampling cost. Also, the key variables for data
reduction cost are total number of questionnaires keypunched and average number
of cards per questionnaire. Furthermore, the key variables for data processing
cost are total number of cards keypunched and whether other computer analysis
is done. The overall goodness of fit of the sampling cost function and data
processing function is much improved, compared with that obtained by Approach
I for mail and personal interview surveys.

Overall, it appears that a better prediction of sampling and data
processing costs could be made by using the functions estimated by Approach II,
at least for mail and personal interview surveys, on which the functions
estimated by Approach I are not statistically significant.

5.  Summary  and  Conclusions 

A partially recursive system of equations for the costs of five survey
operations is estimated by the single equation least squares method. In using
this budget system, only the few independent variables included in the cost
functions have to be predicted to arrive at four cost estimates. The
estimated field cost or data reduction cost will in turn be used to estimate
the cost of project coordination. Due to the small sample size used in this
study, only a few quantitative or qualitative variables are used in each cost
function. Consequently, the estimated cost of a survey operation might be
sensitive to the presence or absence of a survey characteristic measured by
a qualitative variable.

Nevertheless, for the time being, it is proposed that this budget system
be maintained in parallel with the existing budget procedure, so that
the cost estimates for a new survey project provided by both systems can be
compared. The value of this budget system lies not only in the saving of
human effort but also in better accuracy of cost estimates. The predictive
power of this system should be measured by its performance on future surveys.
It is planned to use these budget functions to estimate costs of SRL surveys
completed in 1974. Also, error measures will be computed, namely, the average
absolute error and the Theil U statistic. Furthermore, these new surveys
will in turn be included in the sample for estimating a new system of cost
functions which can include more variables and thus generate less sensitive
predictions. It is hoped that this iterative budget system will eventually
attain the goal of providing speedy cost estimates of a survey project with
respectable accuracy.
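
The two error measures named above might be computed as in the following
sketch. The paper does not say which form of Theil's U is intended; the
version assumed here is the form bounded between 0 and 1, in which 0 indicates
perfect prediction. The cost figures are invented for illustration and the
function names are not from the paper.

```python
import math


def average_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all surveys."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def theil_u(actual, predicted):
    """Theil's U in its bounded form (an assumption; see the text above).

    U = RMSE / (sqrt(mean(predicted^2)) + sqrt(mean(actual^2)))
    """
    n = len(actual)
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    root_mean_sq_pred = math.sqrt(sum(p * p for p in predicted) / n)
    root_mean_sq_act = math.sqrt(sum(a * a for a in actual) / n)
    return rmse / (root_mean_sq_pred + root_mean_sq_act)


# Invented figures: actual and predicted total costs for two surveys.
actual_cost = [100.0, 200.0]
predicted_cost = [110.0, 190.0]

print(average_absolute_error(actual_cost, predicted_cost))
print(theil_u(actual_cost, predicted_cost))
```

The average absolute error is expressed in the cost units themselves, while U
is unit-free, so the two measures together allow comparison both within and
across survey types.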


